### [Total No. of Questions - 9] [Total No. of Printed Pages - 4] (2125)

#### 15226

# B. Tech 6th Semester Examination Parallel Computing (OS)

## IT-6005

Time: 3 Hours Max. Marks: 100

The candidates shall limit their answers precisely within the answerbook (40 pages) issued to them and no supplementary/continuation sheet will be issued.

Note: Attempt five questions in all selecting one question each from section A, B, C, and D. Section E is compulsory.

#### SECTION - A

- 1. (a) Draw the taxonomy of MIMD computer. Also explain its various parts. (10)
  - (b) Explain the following terms:
    - (i) Grain sizes and Latency.
    - (ii) Instruction level.
    - (iii) Communication Latency.
    - (iv) Grain Packing.

S<sub>s</sub>: Store M((R<sub>2</sub>)), 1024

(v) Scheduling. (10)

/Memory (64)←1024/

2. Analyze the data dependences among the following statements in a given program:

S<sub>1</sub>: Load R<sub>1</sub>, 1024 /R<sub>4</sub>←1024/ S<sub>2</sub>: Load R<sub>2</sub>, M(10) /R<sub>2</sub>←Memory (10)/ S<sub>3</sub>: Add R<sub>1</sub>,R<sub>2</sub>  $/R_1 \leftarrow (R_1) + (R_2)/$ S<sub>4</sub>: Store M(1024), R<sub>1</sub> /Memory (1024)←R<sub>1</sub>/

[P.T.O.]

15226 2

where (R<sub>i</sub>) means the content of register R<sub>i</sub> and memory (10) contains 64 initially.

- Draw a dependence graph to show all dependences.
- Are there any resource dependences, if only one copy of each functional unit is available in the CPU?

#### **SECTION - B**

- Suppose a computer which can execute a program in two operational modes: regular mode versus enhanced mode, with a probability distribution of  $\{\alpha, 1-\alpha\}$  respectively.
  - (i) If α varies between a and b and 0≤a≤b≤1, derive an expression for the average speedup factor using harmonic concept.
  - (ii) Calculate the speedup factor when a→0 and b→1.
  - What is the main problem in Amdahl's law and how Gustafson's law removes that problem?
- 4. Consider a 2-level memory hierarchy, M<sub>1</sub> and M<sub>2</sub>. Let the hit ratio of M<sub>1</sub> be h. Let C<sub>1</sub> and C<sub>2</sub> be the costs per KB, S<sub>1</sub> and S<sub>2</sub> are the memory capacities, and t1 and t2 are the access time respectively.
  - Under what condition will the average cost of the entire memory system approach C<sub>2</sub>?
  - (ii) Let  $r=\frac{t_2}{t_1}$  be the speed ratio of the two memories and  $E = \frac{t_1}{t_a(Memory access time)}$  be the access efficiency of the memory system. Express E in terms of r and h.
  - (iii) What is the required hit ratio to make E>0.98 if r=100?

(20)

15226

(10)

## 3 SECTION - C

- (a) Explain the following terms belonging to vector processing concept:
  - (i) Vectorization ratio.
  - (ii) Gather and Scatter instructions.
  - (iii) Vector and Scalar balance point.
  - (iv) Vectorization compiler.
  - (b) Consider a vector computer which can operate in one of two execution modes at a time: one is the vector mode with an execution rate of R<sub>v</sub>=10M Flops and other is the scalar mode with an execution rate of Rs=1MFlops. Let α be the percentage of code that is vectorizable in a typical program mix for the computer.
    - (i) Derive the expression for the average execution rate (Ra) for this computer system.
    - (ii) Determine the vectorization ratio (α) needed in order to achieve an average execution rate of Ra=7.5M Flops.
- Use two-input AND and OR gates (no wired-OR) to construct an n×n crossbar switch network between n-processors and n memory modules. Let the width of each crosspoint be w bits in each direction.
  - (i) Draw a schematic design of a typical crosspoint switch using C<sub>ij</sub> as the enable signal for the switch in the i<sup>th</sup> row and j<sup>th</sup> column. Calculate the total number of AND and OR gates needed as a function of n and w.
  - (ii) Let k=log<sub>2</sub>n be the address width. Design an arbiter which generates all the crosspoint enable signals C<sub>ij</sub> again using one two-input AND and OR gate if needed. (20)

## SECTION - D

7. (a) What are the various parallel programming model? Explain in brief. (10) [P.T.O.]

4 15226

- b) Distinguish between spin Locks and suspend locks for sole access to a critical section. Also generalize Dekker's protocol from two procedures to three or more procedures sharing critical sections. (10)
- 8. (a) Choose an example program to demonstrate the concept of macrotasking, microtasking and autotasking on a cray-like multiprocessor super computer. Perform a trade-off study on the relative performance of the three multitasking schemes based on the example program execution.

(10)

- (b) Explain the following terms:
  - (i) Optimistic concurrency.
  - (ii) Fairness policies for reviving one of the suspended processes waiting in a queue.
  - (iii) Busy-wait versus sleep-wait protocols. (10)

## SECTION - E

- 9. (a) Write characteristics of vector super computers.
  - (b) Difference between Static scheduling and Dynamic scheduling.
  - (c) Explain the various scalability matrices.
  - (d) Define a term coherence locality and also explain its significance
  - (e) Difference between superscalar processors and vector processors.
  - (f) What is the meaning of parallism profile in programs?
  - (g) What is a cache coherence problem?
  - (h) Difference between Y-MP, Pargon and CM-2 environment.
  - What do you mean by monitor and also explain its applications.
  - (j) Explain, briefly VLSI complexity model. (10×2=20)